145 research outputs found
Can electoral popularity be predicted using socially generated big data?
Today, our more-than-ever digital lives leave significant footprints in
cyberspace. Large scale collections of these socially generated footprints,
often known as big data, could help us to re-investigate different aspects of
our social collective behaviour in a quantitative framework. In this
contribution we discuss one such possibility: the monitoring and predicting of
popularity dynamics of candidates and parties through the analysis of socially
generated data on the web during electoral campaigns. Such data offer
considerable possibility for improving our awareness of popularity dynamics.
However, they also suffer from significant drawbacks in terms of
representativeness and generalisability. In this paper we discuss potential
ways around such problems, suggesting the nature of different political systems
and contexts might lend differing levels of predictive power to certain types
of data source. We offer an initial exploratory test of these ideas, focussing
on two data streams, Wikipedia page views and Google search queries. On the
basis of this data, we present popularity dynamics from real case examples of
recent elections in three different countries.
Comment: To appear in Information Technolog
The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics
Activity of modern scholarship creates online footprints galore. Along with
traditional metrics of research quality, such as citation counts, online images
of researchers and institutions increasingly matter in evaluating academic
impact, decisions about grant allocation, and promotion. We examined 400
biographical Wikipedia articles on academics from four scientific fields to
test if being featured in the world's largest online encyclopedia is correlated
with higher academic notability (assessed through citation counts). We found no
statistically significant correlation between Wikipedia article metrics
(length, number of edits, number of incoming links from other articles, etc.)
and academic notability of the mentioned researchers. We also did not find any
evidence that the scientists with better WP representation are necessarily more
prominent in their fields. In addition, we inspected the Wikipedia coverage of
notable scientists sampled from the Thomson Reuters list of "highly cited
researchers". In each of the examined fields, Wikipedia failed to cover
notable scholars properly. Both findings imply that Wikipedia might be
producing an inaccurate image of academics on the front end of science. By
shedding light on how public perception of academic progress is formed, this
study warns that a subjective element might have been introduced into the
hitherto structured system of academic evaluation.
Comment: To appear in EPJ Data Science. To have the Additional Files and
Datasets e-mail the corresponding autho
Mining public opinion: why unsuccessful online petitions should not be ignored
Taha Yasseri argues that by analysing online petition data using computational techniques, politicians can glean fresh insights about the geographic factors influencing constituents’ concerns, the dynamics at play over time, as well as a deeper awareness of the issues most important to the general public.
Modeling the Rise in Internet-based Petitions
Contemporary collective action, much of which involves social media and other
Internet-based platforms, leaves a digital imprint which may be harvested to
better understand the dynamics of mobilization. Petition signing is an example
of collective action which has gained in popularity with rising use of social
media and provides such data for the whole population of petition signatories
for a given platform. This paper tracks the growth curves of all 20,000
petitions to the UK government over 18 months, analyzing the rate of growth and
outreach mechanism. Previous research has suggested the importance of the first
day to the ultimate success of a petition, but has not examined early growth
within that day, made possible here through hourly resolution in the data. The
analysis shows that the vast majority of petitions do not achieve any measure
of success; over 99 percent fail to get the 10,000 signatures required for an
official response and only 0.1 percent attain the 100,000 required for a
parliamentary debate. We analyze the data through a multiplicative process
model framework to explain the heterogeneous growth of signatures at the
population level. We define and measure an average outreach factor for
petitions and show that it decays very fast (reducing to 0.1% after 10 hours).
After 24 hours, a petition's fate is virtually set. The findings seem to
challenge conventional analyses of collective action from economics and
political science, where the production function has been assumed to follow an
S-shaped curve.
Comment: Submitted to EPJ Data Scienc
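The multiplicative growth with a fast-decaying outreach factor described in this abstract can be illustrated with a minimal sketch. This is not the authors' model: the function name, the parameters (`initial_signatures`, `r0`, `noise_sigma`), the exponential form of the decay, and the lognormal noise are all assumptions made for illustration. The "0.1% after 10 hours" figure is read here as the factor falling to 0.1% of its starting value, which is only one possible interpretation.

```python
import random


def simulate_petition(initial_signatures=10, r0=1.0, hours=24,
                      noise_sigma=0.5, seed=0):
    """Toy multiplicative process: each hour the running total is multiplied
    by (1 + outreach * shock), where the outreach factor decays over time.

    All parameter names and default values are hypothetical.
    """
    rng = random.Random(seed)
    # Assumed decay: the outreach factor drops to 0.1% of r0 after 10 hours.
    decay = 0.001 ** (1 / 10)
    totals = [float(initial_signatures)]
    for hour in range(hours):
        outreach = r0 * decay ** hour
        # Random multiplicative shock, a stand-in for heterogeneity
        # between petitions (equals 1.0 when noise_sigma is 0).
        shock = rng.lognormvariate(0.0, noise_sigma)
        totals.append(totals[-1] * (1.0 + outreach * shock))
    return totals


if __name__ == "__main__":
    curve = simulate_petition()
    for hour in (1, 5, 10, 24):
        print(hour, round(curve[hour], 1))
```

In this sketch almost all growth happens in the first few hours and the curve is nearly flat well before hour 24, which is the qualitative behaviour the abstract reports.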
Topic Modelling of Everyday Sexism Project Entries
The Everyday Sexism Project documents everyday examples of sexism reported by
volunteer contributors from all around the world. It collected 100,000 entries
in 13+ languages within the first 3 years of its existence. The content of
reports in various languages submitted to Everyday Sexism is a valuable source
of crowdsourced information with great potential for feminist and gender
studies. In this paper, we take a computational approach to analyze the content
of reports. We use topic-modelling techniques to extract emerging topics and
concepts from the reports, and to map the semantic relations between those
topics. The resulting picture closely resembles and adds to that arrived at
through qualitative analysis, showing that this form of topic modeling could be
useful for sifting through datasets that had not previously been subject to any
analysis. More precisely, we come up with a map of topics for two different
resolutions of our topic model and discuss the connection between the
identified topics. In the low resolution picture, for instance, we found Public
space/Street, Online, Work related/Office, Transport, School, Media harassment,
and Domestic abuse. Among these, the strongest connection is between Public
space/Street harassment and Domestic abuse and sexism in personal
relationships. The strength of the relationships between topics illustrates the
fluid and ubiquitous nature of sexism, with no single experience being
unrelated to another.
Comment: preprint, under revie
Female scholars need to achieve more for equal public recognition
Different kinds of "gender gap" have been reported in different walks of
scientific life, almost always favouring male scientists over females. In this
work, for the first time, we present a large-scale empirical analysis to ask
whether female scientists with the same level of scientific accomplishment are
as likely as males to be recognised. We particularly focus on Wikipedia, the
open online encyclopedia whose open nature gives us a proxy for
community recognition. We calculate the probability of appearing on Wikipedia
as a scientist for both male and female scholars in three different fields. We
find that women in Physics, Economics and Philosophy are considerably less
likely than men to be recognised on Wikipedia across all levels of achievement.
Comment: Under revie
Wikipedia traffic data and electoral prediction: towards theoretically informed models
The aim of this article is to explore the potential use of Wikipedia page
view data for predicting electoral results. Responding to previous critiques of
work using socially generated data to predict elections, which have argued that
these predictions take place without any understanding of the mechanism which
enables them, we first develop a theoretical model which highlights why people
might seek information online at election time, and how this activity might
relate to overall electoral outcomes, focussing especially on how different
types of parties such as new and established parties might generate different
information seeking patterns. We test this model on a novel dataset drawn from
a variety of countries in the 2009 and 2014 European Parliament elections. We
show that while Wikipedia offers little insight into absolute vote outcomes, it
offers good information about changes in both overall turnout at elections
and in vote share for particular parties. These results are used to enhance
existing theories about the drivers of aggregate patterns in online information
seeking.
Comment: submitted to EPJ Data Science. Additional File 1 available at
https://drive.google.com/open?id=0BxaGC-YCTO6SWkJhRXlrMVRYVl
Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: The Case of Zooniverse
Citizen Science is research undertaken by professional scientists and members of the public collaboratively. Despite the numerous benefits of citizen science for both the advancement of science and the community of citizen scientists, there is still no comprehensive knowledge of the patterns of contributions and the demography of contributors to citizen science projects. In this paper we provide a first overview of the spatiotemporal and gender distribution of the citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest online citizen science platforms, Zooniverse. First we report on the uneven geographical distribution of citizen scientists and model the variations among countries based on socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on the high “burstiness” of participation instances as well as the leisurely nature of participation suggested by the time of day at which citizen scientists were most active. Finally, we discuss the gender imbalance among online citizen scientists (about 30% female) and compare it with other collaborative projects as well as the gender distribution in more formal scientific activities. Online citizen science projects need further attention from outside the academic community, and our findings can help attract the attention of public and private stakeholders, as well as inform the design of the platforms and science policy-making processes.
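The abstract does not say how "burstiness" was measured. One common choice, assumed here purely for illustration, is the coefficient B = (σ − μ) / (σ + μ) computed from the mean μ and standard deviation σ of the gaps between consecutive events: B is −1 for perfectly regular activity, near 0 for a Poisson-like process, and approaches 1 for highly bursty activity. The function name and the input format (a list of numeric timestamps) are hypothetical.

```python
from statistics import mean, pstdev


def burstiness(event_times):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the
    inter-event times. Assumed measure, not necessarily the paper's."""
    times = sorted(event_times)
    if len(times) < 3:
        raise ValueError("need at least three events (two gaps)")
    # Gaps between consecutive events.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events share the same timestamp")
    return (sigma - mu) / (sigma + mu)


if __name__ == "__main__":
    print(burstiness([0, 1, 2, 3, 4]))    # evenly spaced events
    print(burstiness([0, 1, 2, 3, 100]))  # short gaps, then one long gap
```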
Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods
Massive Open Online Courses (MOOCs) offer unprecedented opportunities to
learn at scale. Within a few years, the phenomenon of crowd-based learning has
gained enormous popularity with millions of learners across the globe
participating in courses ranging from Popular Music to Astrophysics. They have
captured the imaginations of many, attracting significant media attention -
with The New York Times naming 2012 "The Year of the MOOC." For those engaged
in learning analytics and educational data mining, MOOCs have provided an
exciting opportunity to develop innovative methodologies that harness big data
in education.
Comment: Preprint of a chapter to appear in "Data Mining and Learning
Analytics: Applications in Educational Research
Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis
Have we become more tolerant of dating people of different social backgrounds
compared to ten years ago? Has the rise of online dating exacerbated or
alleviated gender inequalities in modern courtship? Are the most attractive
people on these platforms necessarily the most successful? In this work, we
examine the mate preferences and communication patterns of male and female
users of the online dating site eHarmony over the past decade to identify how
attitudes and behaviors have changed over this time period. While other studies
have investigated disparities in user behavior between male and female users,
this study is unique in its longitudinal approach. Specifically, we analyze how
men and women differ in their preferences for certain traits in potential
partners and how those preferences have changed over time. The second line of
inquiry investigates to what extent physical attractiveness determines the rate
of messages a user receives, and how this relationship varies between men and
women. Thirdly, we explore whether online dating practices between males and
females have become more equal over time or if biases and inequalities have
remained constant (or increased). Fourthly, we study the behavioural traits in
sending and replying to messages based on one's own experience of receiving
messages and being replied to. Finally, we found that similarity between
profiles is not a predictor for success except for the number of children and
smoking habits. This work could have broader implications for shifting gender
norms and social attitudes, reflected in online courtship rituals. Apart from
the data-based research, we connect the results to existing theories that
concern the role of ICTs in societal change. As searching for love online
becomes increasingly common across generations and geographies, these findings
may shed light on how people can build relationships through the Internet.
Comment: Preprint, under revie